General text classification model combining attention and cropping mechanism
Yumeng CUI, Jingya WANG, Xiaowen LIU, Shangyi YAN, Zhizhong TAO
Journal of Computer Applications    2023, 43 (8): 2396-2405.   DOI: 10.11772/j.issn.1001-9081.2022071071

To address the problem that current classification models typically perform well only on texts of a single length, while real-world scenarios mix long and short texts, a General Long and Short Text Classification Model based on a Hybrid Neural Network (GLSTCM-HNN) was proposed. First, BERT (Bidirectional Encoder Representations from Transformers) was applied to encode texts dynamically. Then, convolution operations were used to extract local semantic information, and a Dual Channel ATTention mechanism (DCATT) was built to enhance key text regions. At the same time, a Recurrent Neural Network (RNN) was used to capture global semantic information, and a Long Text Cropping Mechanism (LTCM) was established to filter out the critical text segments. Finally, the extracted local and global features were fused and fed into a Softmax function to obtain the output category. In comparison experiments on four public datasets, GLSTCM-HNN improved the F1 score by up to 3.87 percentage points over the baseline model (BERT-TextCNN) and by up to 5.86 percentage points over the best-performing comparison model, BERT. In two generality experiments on mixed texts, GLSTCM-HNN improved the F1 score by 6.63 and 37.22 percentage points, respectively, over the generality model proposed in prior work, CBLGA (an attention-based CNN-BiLSTM/BiGRU hybrid text classification model). Experimental results show that the proposed model effectively improves text classification accuracy, and that it generalizes both to texts whose lengths differ from the training data and to mixed long and short texts.
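The final step of the pipeline described above, fusing the local (CNN + attention) and global (RNN) feature vectors and applying Softmax to produce a class distribution, can be sketched in a few lines. This is a minimal, framework-free illustration, not the authors' implementation; the feature dimensions, weights, and function names here are all hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(local_feat, global_feat, weights, bias):
    # Fuse the two branches by concatenation, then apply a linear
    # layer followed by softmax to obtain class probabilities.
    fused = local_feat + global_feat  # feature concatenation
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)

# Toy example: 2 local dims + 2 global dims, 2 output classes.
local_feat = [0.5, -0.2]   # stand-in for the CNN + DCATT branch output
global_feat = [1.0, 0.3]   # stand-in for the RNN + LTCM branch output
W = [[0.1, 0.2, 0.3, 0.4],
     [-0.1, 0.0, 0.2, 0.1]]
b = [0.0, 0.1]
probs = classify(local_feat, global_feat, W, b)
```

In a real model the fusion weights would be learned jointly with both branches; the sketch only shows how concatenated features map to a probability distribution over categories.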
